Search CORE

38 research outputs found

Ultimate Intelligence Part I: Physical Completeness and Objectivity of Induction

Author: AD Tubbs
CH Bennett
D Deutsch
I Wood
M Hutter
P Sunehag
PH Frampton
RJ Solomonoff
S Lloyd
Publication date: 09/04/2015

We propose that Solomonoff induction is complete in the physical sense via several strong physical arguments. We also argue that Solomonoff induction is fully applicable to quantum mechanics. We show how to choose an objective reference machine for universal induction by defining a physical message complexity and physical message probability, and argue that this choice dissolves some well-known objections to universal induction. We also introduce many more variants of physical message complexity based on energy and action, and discuss the ramifications of our proposals.
Comment: Under review at the AGI-2015 conference. An early draft was submitted to ALT-2014. This paper is now being split into two papers, one philosophical and one more technical. We intend that all installments of the paper series will be on the arxi
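For orientation, the standard textbook objects behind this abstract are sketched below. These are the usual definitions from algorithmic information theory, not the paper's proposed physical message complexity: the universal prior of a monotone universal machine U, its predictive distribution, and the invariance theorem, whose machine-dependent constant is the reason the choice of reference machine matters.

```latex
% Universal (Solomonoff) prior of a monotone universal machine U:
% sum over minimal programs p whose output begins with the string x.
M_U(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)}

% Prediction of the next symbol given the sequence observed so far.
M_U(x_{t+1} \mid x_{1:t}) = \frac{M_U(x_{1:t}\,x_{t+1})}{M_U(x_{1:t})}

% Invariance theorem (prefix complexity): for a universal prefix machine U
% and any prefix machine V there is a constant c_{UV}, independent of x, with
K_U(x) \le K_V(x) + c_{UV}
```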

arXiv.org e-Print Archive

Crossref

Extreme State Aggregation Beyond MDPs

Author: A.L. Strehl
I. Fazekas
M. Hutter
M.L. Puterman
O.-A. Maillard
P. Nguyen
P. Sunehag
R. Givan
R.S. Sutton
S.J. Russell
T. Jaksch
T. Lattimore
V. Vovk
Publication date: 01/01/2014

We consider a Reinforcement Learning setup where an agent interacts with an environment in observation-reward-action cycles without any (especially MDP) assumptions on the environment. State aggregation, and more generally feature reinforcement learning, is concerned with mapping histories/raw states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately) forms a small stationary finite-state MDP, which can then be efficiently solved or learnt. We considerably generalize existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with the same state-space size solve the original problem, as long as the solution can approximately be represented as a function of the reduced states. This implies an upper bound on the required state-space size that holds uniformly for all RL problems. It may also explain why RL algorithms designed for MDPs sometimes perform well beyond MDPs.
Comment: 28 LaTeX pages. 8 Theorem
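A minimal sketch of the general state-aggregation idea described in this abstract, assuming a hypothetical feature map phi and a toy environment; it is not the paper's associated-MDP construction. Samples are pooled per aggregated state and action into a small empirical MDP, which value iteration then solves. The choice of phi (keep only the last observation), the discount 0.9, and the toy experience are illustrative assumptions.

```python
from collections import defaultdict


def aggregate_mdp(episodes, phi):
    """Pool (history, action, reward, next_history) samples into a reduced MDP.

    phi maps a history (tuple of observations) to an aggregated state.
    Returns estimated transition probabilities P[s][a][s2] and mean rewards R[s][a].
    """
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    reward_sum = defaultdict(lambda: defaultdict(float))
    visits = defaultdict(lambda: defaultdict(int))
    for episode in episodes:
        for history, action, reward, next_history in episode:
            s, s_next = phi(history), phi(next_history)
            counts[s][action][s_next] += 1
            reward_sum[s][action] += reward
            visits[s][action] += 1
    P = {s: {a: {s2: c / visits[s][a] for s2, c in nxt.items()}
             for a, nxt in acts.items()}
         for s, acts in counts.items()}
    R = {s: {a: reward_sum[s][a] / visits[s][a] for a in acts}
         for s, acts in counts.items()}
    return P, R


def q_value(P, R, V, s, a, gamma):
    """One-step lookahead value of action a in aggregated state s."""
    return R[s][a] + gamma * sum(p * V.get(s2, 0.0) for s2, p in P[s][a].items())


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve the reduced MDP; aggregated states never acted in keep value 0."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, R, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(P[s], key=lambda a: q_value(P, R, V, s, a, gamma)) for s in P}
    return V, policy


def phi(history):
    # Hypothetical aggregation: keep only the most recent observation.
    return history[-1]


# Toy experience: in "A", "go" pays 1 and leads to "B"; in "B", "stay" pays 2.
episodes = [[
    (("A",), "go", 1.0, ("A", "B")),
    (("A", "B"), "stay", 2.0, ("A", "B", "B")),
    (("A", "B", "B"), "go", 0.0, ("A", "B", "B", "A")),
    (("A", "B", "B", "A"), "stay", 0.0, ("A", "B", "B", "A", "A")),
]]
P, R = aggregate_mdp(episodes, phi)
V, policy = value_iteration(P, R)
print(policy)  # {'A': 'go', 'B': 'stay'}
```

The reduced problem has only two states however long the histories grow, which is the point of aggregation; whether its solution is good for the original process depends on how well phi preserves the relevant values.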

arXiv.org e-Print Archive

Crossref

The Australian National University

Reinforcement Learning Agents acquire Flocking and Symbiotic Behaviour in Simulated Ecosystems

Author: Eccles T
Graepel T
Heess N
Hughes E
Leibo JZ
Lever G
Liu S
Merel J
Sunehag P
Publication venue: Conference on Artificial Life (ALIFE) - How Can Artificial Life Help Solve Societal Challenges?
Publication date: 01/01/2019

In nature, group behaviours such as flocking as well as cross-species symbiotic partnerships are observed in vastly different forms and circumstances. We hypothesize that such strategies can arise in response to generic predator-prey pressures in a spatial environment with range-limited sensation and action. We evaluate whether these forms of coordination can emerge by independent multi-agent reinforcement learning in simple multiple-species ecosystems. In contrast to prior work, we avoid hand-crafted shaping rewards, specific actions, or dynamics that would directly encourage coordination across agents. Instead, we test whether coordination emerges as a consequence of adaptation without encouraging these specific forms of coordination, which have only indirect benefits. Our simulated ecosystems consist of a generic food chain involving three trophic levels: apex predator, mid-level predator, and prey. We conduct experiments on two different platforms: a 3D physics engine with tens of agents and a 2D grid world with up to thousands. The results clearly confirm our hypothesis and show substantial coordination both within and across species. To obtain these results, we leverage and adapt recent advances in deep reinforcement learning within an ecosystem training protocol featuring homogeneous groups of independent agents from different species (sets of policies), acting in many different random combinations in parallel habitats. The policies utilize neural network architectures that are invariant to agent individuality but not type (species) and that generalize across varying numbers of observed other agents. While the emergence of complexity in artificial ecosystems has long been studied in the artificial life community, the focus has been more on individual complexity and genetic algorithms or explicit modelling, and less on the group complexity and reinforcement learning emphasized in this article.
Unlike what the name and intuition suggest, reinforcement learning here adapts over evolutionary history rather than a lifetime, and addresses the sequential optimization of fitness that is usually approached with genetic algorithms in the artificial life community. We shift from procedures to objectives, which allows us to bring powerful new machinery to bear, and we see complex behaviour emerge from a sequence of simple optimization problems.

Crossref

UCL Discovery

Bayesian reinforcement learning with exploration

Author: E. Even-Dar
I. Szita
K. Dyagilev
L. Orseau
M. Hutter
M. Kearns
M.G. Azar
P. Auer
P. Sunehag
S. Mannor
T. Lattimore
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
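A Bayes-optimal agent acts on a posterior over a class of candidate environments. The sketch below shows only that Bayesian-mixture ingredient, for a hypothetical finite class of i.i.d. Bernoulli environments; it is a simplification of the history-based setting and does not implement the paper's combination with an exploring policy or its sample-complexity guarantees. All names and numbers are illustrative assumptions.

```python
def posterior_update(weights, models, observation):
    """Apply Bayes' rule once over a finite class of candidate environments.

    weights: dict name -> current (prior) weight, summing to 1.
    models: dict name -> probability that the environment emits a 1.
    observation: the observed bit (0 or 1).
    """
    unnormalized = {}
    for name, w in weights.items():
        p_one = models[name]
        likelihood = p_one if observation == 1 else 1.0 - p_one
        unnormalized[name] = w * likelihood
    total = sum(unnormalized.values())
    return {name: u / total for name, u in unnormalized.items()}


def mixture_prediction(weights, models):
    """Bayes-mixture probability that the next observation is a 1."""
    return sum(w * models[name] for name, w in weights.items())


# Hypothetical two-environment class with a uniform prior.
models = {"mostly_low": 0.2, "mostly_high": 0.8}
weights = {"mostly_low": 0.5, "mostly_high": 0.5}
for obs in [1, 1, 0, 1]:
    weights = posterior_update(weights, models, obs)

print(round(weights["mostly_high"], 4))  # 0.9412 (= 16/17)
print(round(mixture_prediction(weights, models), 4))  # 0.7647 (= 13/17)
```

Acting greedily on such a posterior can stop gathering information too early, which is the kind of failure an added exploring policy is meant to guard against.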

Crossref

The Australian National University

Intelligence as inference or forcing Occam on the world

Author: A. Dempster
G. Hinton
H. Shteingart
J. Schmidhuber
L. Orseau
M. Botvinick
M. Hutter
M.J. West-Eberhard
N. Fremaux
P. Dayan
P. Sunehag
R.J. Herrnstein
S. Legg
S.J. Russell
Y. Loewenstein
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

We propose to perform the optimization task of Universal Artificial Intelligence (UAI) by learning a reference machine on which good programs are short. We also acknowledge that the choice of reference machine on which the UAI objective is based is arbitrary, and we therefore learn a machine suited to the environment we are in. This is based on viewing Occam’s razor as an imperative rather than as a proposition about the world. Since this principle cannot be true for all reference machines, we need to find a machine that makes it true. We want both good policies and the environment to have short implementations on the machine. Such a machine is learnt iteratively through a procedure that generalizes the principle underlying the Expectation-Maximization algorithm.
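The principle behind Expectation-Maximization alternates between inferring hidden quantities under the current model (E-step) and refitting the model to those inferences (M-step). As a reminder of that classical principle only, and not the paper's reference-machine learning procedure, here is a minimal EM sketch for a one-dimensional mixture of two Gaussians with a shared, fixed standard deviation; the data, initial means, and fixed sigma are illustrative assumptions.

```python
import math


def em_two_gaussians(data, mu, sigma=1.0, iterations=50):
    """EM for a 1-D mixture of two Gaussians with fixed, shared std. dev. sigma.

    mu: initial guesses [mu0, mu1]. Returns (means, mixing weights).
    """
    mu = list(mu)
    pi = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        # The Gaussian normalizing constant is shared, so it cancels.
        resp = []
        for x in data:
            dens = [pi[k] * math.exp(-((x - mu[k]) ** 2) / (2 * sigma ** 2))
                    for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: refit means and mixing weights to the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            pi[k] = nk / len(data)
    return mu, pi


data = [-0.1, 0.0, 0.1, 4.9, 5.0, 5.1]
means, weights = em_two_gaussians(data, mu=[1.0, 4.0])
```

Starting from the guesses 1 and 4, the means move to the two clusters near 0 and 5 and the mixing weights settle near one half each.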

Crossref

The Australian National University

Melting Pot 2.0

Author: Agapiou John P.
Comanescu Ramona
Duéñez-Guzmán Edgar A.
Haas Julia
Johanson Michael B.
Kopparapu Kavya
Köster Raphael
Leibo Joel Z.
Madhushani Udari
Mao Yiran
Matyas Jayd
Mobbs Dean
Mordatch Igor
Singh Sukhdeep
Strouse DJ
Sunehag Peter
Vezhnevets Alexander Sasha
Publication date: 21/01/2023

Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population"), to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly studied extreme cases of perfectly competitive (zero-sum) motivations and perfectly cooperative (shared-reward) motivations, but does not stop with them. As in real life, a clear majority of scenarios in Melting Pot have mixed incentives. They are neither purely competitive nor purely cooperative, and thus demand that successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles, and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.
Comment: 59 pages, 54 figures. arXiv admin note: text overlap with arXiv:2107.0685

arXiv.org e-Print Archive

Real method of interpolation on subcouples of codimension one

Author: P. Sunehag
S. V. Astashkin
Publication venue: Institute of Mathematics, Polish Academy of Sciences
Publication date: 01/01/2008

Crossref

Real method of interpolation on subcouples of codimension one

Author: Astashkin S. V.
Sunehag P.
Publication venue: Institute of Mathematics, Polish Academy of Sciences
Publication date: 24/02/2016

We find necessary and sufficient conditions under which the norms of the interpolation spaces $(N_0,N_1)_{\theta,q}$ and $(X_0,X_1)_{\theta,q}$ are equivalent on $N$, where $N$ is the kernel of a nonzero functional $\psi\in(X_0\cap X_1)^*$ and $N_i$ is the normed space $N$ with the norm inherited from $X_i$ ($i=0,1$). Our proof is based on reducing the problem to the special case studied by Ivanov and Kalton, where $\psi$ is bounded on one of the endpoint spaces. As an application, we completely resolve the problem of when the range of the operator $T_\theta=S-2^\theta I$ ($S$ denotes the shift operator and $I$ the identity) is closed in any $\ell_p(\mu)$, where the weight $\mu=(\mu_n)_{n\in\mathbb{Z}}$ satisfies the inequalities $\mu_n\le\mu_{n+1}\le 2\mu_n$ ($n\in\mathbb{Z}$).
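For reference, the real interpolation spaces $(X_0,X_1)_{\theta,q}$ in this abstract are usually defined through Peetre's K-functional. This is standard textbook material, not a result of the paper:

```latex
% Peetre's K-functional for x in X_0 + X_1 and t > 0:
K(t, x; X_0, X_1) = \inf_{x = x_0 + x_1,\ x_i \in X_i} \left( \|x_0\|_{X_0} + t\,\|x_1\|_{X_1} \right)

% Norm of the real interpolation space, 0 < \theta < 1, 1 \le q < \infty
% (for q = \infty the integral is replaced by \sup_{t > 0} t^{-\theta} K(t, x)):
\|x\|_{(X_0, X_1)_{\theta, q}} = \left( \int_0^\infty \bigl( t^{-\theta} K(t, x; X_0, X_1) \bigr)^q \, \frac{dt}{t} \right)^{1/q}
```

For the couple $(N_0,N_1)$ the infimum runs only over decompositions with $x_0, x_1 \in N$, so $K(t,x;N_0,N_1) \ge K(t,x;X_0,X_1)$ and the $(N_0,N_1)_{\theta,q}$ norm dominates the restricted $(X_0,X_1)_{\theta,q}$ norm. The question is when the reverse estimate also holds up to a constant.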

The Australian National University